GANVO: Unsupervised Deep Monocular Visual Odometry and Depth Estimation with Generative Adversarial Networks
In the last decade, supervised deep learning approaches have been extensively employed in visual odometry (VO) applications, but they are not feasible in environments where labelled data is not abundant. On the other hand, unsupervised deep learning approaches for localization and mapping in unknown environments from unlabelled data have received comparatively little attention in VO research. In this study, we propose a generative unsupervised learning framework that predicts 6-DoF camera pose and a monocular depth map of the scene from unlabelled RGB image sequences, using deep convolutional Generative Adversarial Networks (GANs). We create a supervisory signal by warping view sequences and assigning the re-projection minimization as the objective loss function adopted in the multi-view pose estimation and single-view depth generation networks. Detailed quantitative and qualitative evaluations of the proposed framework on the KITTI and Cityscapes datasets show that the proposed method outperforms both existing traditional and unsupervised deep VO methods, providing better results for both pose estimation and depth recovery.
Comment: ICRA 2019 - accepted
RADA: Robust Adversarial Data Augmentation for Camera Localization in Challenging Conditions
Camera localization is a fundamental problem for many applications in computer vision, robotics, and autonomy. Despite recent deep learning-based approaches, a lack of robustness in challenging conditions persists due to changes in appearance caused by texture-less planes, repeating structures, reflective surfaces, motion blur, and illumination changes. Data augmentation is an attractive solution, but standard image perturbation methods fail to improve localization robustness. To address this, we propose RADA, which concentrates on perturbing the most vulnerable pixels, generating relatively small image perturbations that nonetheless perplex the network. Our method outperforms previous augmentation techniques, achieving up to twice the accuracy of state-of-the-art models even under 'unseen' challenging weather conditions. Videos of our results can be found at https://youtu.be/niOv7-fJeCA. The source code for RADA is publicly available at https://github.com/jialuwang123321/RAD
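The core idea, concentrating perturbations on the pixels the pose network is most sensitive to, can be pictured as a masked FGSM-style step: rank pixels by input-gradient magnitude and perturb only the top fraction. The PyTorch sketch below is one illustrative reading; the function name, hyper-parameters, and selection rule are assumptions, not RADA's published algorithm.

import torch

def vulnerable_pixel_perturbation(model, images, poses, loss_fn,
                                  epsilon=0.02, keep_ratio=0.05):
    """Perturb only the 'most vulnerable' pixels: score each pixel by its
    input-gradient magnitude and apply a small signed step to the top
    keep_ratio fraction (illustrative sketch, not the exact RADA method)."""
    images = images.clone().detach().requires_grad_(True)
    loss = loss_fn(model(images), poses)
    grad, = torch.autograd.grad(loss, images)

    # Per-pixel vulnerability score: gradient magnitude summed over channels.
    score = grad.abs().sum(dim=1, keepdim=True)               # (B, 1, H, W)
    k = max(1, int(keep_ratio * score[0].numel()))
    thresh = score.flatten(1).topk(k, dim=1).values[:, -1]    # k-th largest score
    mask = (score >= thresh.view(-1, 1, 1, 1)).float()

    # Gradient-ascent step restricted to the selected pixels.
    adv = images + epsilon * grad.sign() * mask
    return adv.clamp(0.0, 1.0).detach()

The perturbed images are then mixed into the training set of the localization network as augmented samples.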
Learning Monocular Visual Odometry through Geometry-Aware Curriculum Learning
Inspired by the cognitive process of humans and animals, Curriculum Learning (CL) trains a model by gradually increasing the difficulty of the training data. In this paper, we study whether CL can be applied to complex geometry problems like estimating monocular Visual Odometry (VO). Unlike existing CL approaches, we present a novel CL strategy for learning the geometry of monocular VO by gradually making the learning objective more difficult during training. To this end, we propose a novel geometry-aware objective function that jointly optimizes relative and composite transformations over small windows via a bounded pose regression loss. A cascade optical flow network followed by a recurrent network with a differentiable windowed composition layer, termed CL-VO, is devised to learn the proposed objective. Evaluation on three real-world datasets shows superior performance of CL-VO over state-of-the-art
feature-based and learning-based VO.
Comment: accepted at IEEE ICRA 2019
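A rough PyTorch sketch of the windowed composition idea follows: per-step relative transforms predicted by the network are chained over a small window, and both the per-step and composite errors pass through a bounded penalty so that harder, longer-horizon terms do not dominate early in the curriculum. The matrix-difference error and tanh bound here are illustrative stand-ins, not the paper's exact geometry-aware loss.

import torch

def bounded(err, alpha=1.0):
    """A bounded penalty (illustrative choice): saturates for large errors."""
    return torch.tanh(alpha * err)

def windowed_geometry_loss(pred_rel, gt_rel, window=3):
    """Jointly penalize per-step relative transforms and their composition
    over small sliding windows.
    pred_rel, gt_rel: (B, N, 4, 4) relative SE(3) transforms between
    consecutive frames, as homogeneous matrices."""
    B, N = pred_rel.shape[:2]

    # Relative (single-step) term.
    loss = bounded((pred_rel - gt_rel).abs().mean())

    # Composite term: chain transforms over each window and compare.
    for start in range(0, N - window + 1):
        Tp, Tg = pred_rel[:, start], gt_rel[:, start]
        for i in range(start + 1, start + window):
            Tp = Tp @ pred_rel[:, i]
            Tg = Tg @ gt_rel[:, i]
        loss = loss + bounded((Tp - Tg).abs().mean())
    return loss

During training, the curriculum would start from the easier single-step term and progressively emphasize the composite windows, matching the gradually more difficult objective described above.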
OdomBeyondVision: An Indoor Multi-modal Multi-platform Odometry Dataset Beyond the Visible Spectrum
This paper presents a multimodal indoor odometry dataset, OdomBeyondVision, featuring multiple sensors across different parts of the spectrum, collected with different mobile platforms. Not only does OdomBeyondVision contain traditional navigation sensors such as IMUs, mechanical LiDARs and RGB-D cameras, it also includes several emerging sensors such as single-chip mmWave radar, LWIR thermal cameras and solid-state LiDAR. With the above sensors mounted on UAV, UGV and handheld platforms, we recorded multimodal odometry data and the corresponding movement trajectories in various indoor scenes and under different illumination conditions. We release exemplar radar, radar-inertial and thermal-inertial odometry implementations and demonstrate their results, providing baselines for future works to compare against and improve upon. The full dataset, including toolkit and documentation, is publicly available at:
https://github.com/MAPS-Lab/OdomBeyondVision
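Purely as an illustration of what one synchronized multimodal sample might look like, the hypothetical container below lists the sensor streams named above; the actual data layout, file formats, and toolkit API are those documented in the linked repository, not this sketch.

from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalFrame:
    """Hypothetical container for one time-synchronized sample; field names
    and shapes are illustrative only."""
    timestamp: float
    imu: np.ndarray            # (6,) accelerometer + gyroscope reading
    lidar_points: np.ndarray   # (N, 3) mechanical or solid-state LiDAR scan
    rgbd: np.ndarray           # (H, W, 4) RGB-D image
    mmwave_points: np.ndarray  # (M, 4) single-chip mmWave radar points (x, y, z, Doppler)
    thermal: np.ndarray        # (Ht, Wt) LWIR thermal image, raw radiometric values
    gt_pose: np.ndarray        # (4, 4) ground-truth platform pose along the trajectory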
Graph-based Thermal-Inertial SLAM with Probabilistic Neural Networks
Simultaneous Localization and Mapping (SLAM) systems typically employ
vision-based sensors to observe the surrounding environment. However, the
performance of such systems highly depends on the ambient illumination
conditions. In scenarios with adverse visibility or in the presence of airborne
particulates (e.g. smoke, dust, etc.), alternative modalities such as those
based on thermal imaging and inertial sensors are more promising. In this
paper, we propose the first complete thermal-inertial SLAM system which
combines neural abstraction in the SLAM front end with robust pose graph
optimization in the SLAM back end. We model the sensor abstraction in the front
end by employing probabilistic deep learning parameterized by Mixture Density
Networks (MDN). Our key strategies to successfully model this encoding from
thermal imagery are the usage of normalized 14-bit radiometric data, the
incorporation of hallucinated visual (RGB) features, and the inclusion of
feature selection to estimate the MDN parameters. To enable a full SLAM system,
we also design an efficient global image descriptor which is able to detect
loop closures from thermal embedding vectors. We performed extensive
experiments and analysis using three datasets, namely self-collected ground
robot and handheld data taken in indoor environments, and one public dataset
(SubT-tunnel) collected in an underground tunnel. Finally, we demonstrate that an
accurate thermal-inertial SLAM system can be realized in conditions of both
benign and adverse visibility.
Comment: Accepted to IEEE Transactions on Robotics
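The probabilistic front end can be pictured as a Mixture Density Network head that regresses a distribution over relative pose from thermal (and hallucinated visual) features. The PyTorch sketch below shows a generic diagonal-Gaussian MDN head and its negative log-likelihood; layer sizes, the number of mixtures, and the feature dimensionality are assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class MDNPoseHead(nn.Module):
    """Generic MDN head over a 6-DoF relative pose (diagonal-Gaussian mixture)."""
    def __init__(self, feat_dim=512, n_mixtures=5, pose_dim=6):
        super().__init__()
        self.n_mixtures, self.pose_dim = n_mixtures, pose_dim
        self.pi = nn.Linear(feat_dim, n_mixtures)                    # mixture weights
        self.mu = nn.Linear(feat_dim, n_mixtures * pose_dim)         # component means
        self.log_sigma = nn.Linear(feat_dim, n_mixtures * pose_dim)  # component scales

    def forward(self, feats):
        B = feats.shape[0]
        log_pi = torch.log_softmax(self.pi(feats), dim=-1)
        mu = self.mu(feats).view(B, self.n_mixtures, self.pose_dim)
        sigma = self.log_sigma(feats).view(B, self.n_mixtures, self.pose_dim).exp()
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    """Negative log-likelihood of the target pose under the predicted mixture."""
    comp = torch.distributions.Normal(mu, sigma)
    log_prob = comp.log_prob(target.unsqueeze(1)).sum(dim=-1)   # (B, K)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

In this pipeline the thermal input would first be normalized from its raw 14-bit radiometric range (e.g. dividing by 2**14 - 1) before feature extraction, per the strategy described in the abstract.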
VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization
Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and to deal with degraded or missing input is less well studied. In particular, we note that previous approaches to deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of naive approaches to feature-space fusion through summation or concatenation, which do not take into account the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works that directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting. Our model is extensively evaluated on RGB-D datasets and the results demonstrate the efficacy of our approach. The source code is available at https://github.com/Zalex97/VMLoc
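The variational Product-of-Experts fusion amounts to multiplying per-modality Gaussian posteriors: precisions add and means are precision-weighted, with a standard-normal prior expert keeping the product well defined when a modality is degraded or missing. The PyTorch sketch below shows only this fusion step; VMLoc's attention-based fusion and importance-weighted objective are not reproduced, and the function name and shapes are assumptions.

import torch

def product_of_experts(mus, logvars, eps=1e-8):
    """Fuse per-modality Gaussian posteriors q_m(z | x_m) into a joint Gaussian.
    mus, logvars: lists of (B, D) tensors, one entry per available modality."""
    B, D = mus[0].shape
    # Prior expert N(0, I); it also keeps the product defined when modalities are missing.
    mus = [torch.zeros(B, D)] + list(mus)
    logvars = [torch.zeros(B, D)] + list(logvars)

    precisions = [torch.exp(-lv) for lv in logvars]        # 1 / sigma^2
    joint_precision = sum(precisions) + eps
    joint_mu = sum(p * m for p, m in zip(precisions, mus)) / joint_precision
    joint_logvar = -torch.log(joint_precision)
    return joint_mu, joint_logvar

# Example: fuse image and depth latent posteriors of dimension 64 for a batch of 2.
mu_img, lv_img = torch.randn(2, 64), torch.zeros(2, 64)
mu_dep, lv_dep = torch.randn(2, 64), torch.zeros(2, 64)
mu_z, lv_z = product_of_experts([mu_img, mu_dep], [lv_img, lv_dep])

A pose regressor would then be trained on latent samples drawn from the fused Gaussian via the reparameterization trick, with the importance-weighted objective providing the unbiased training signal mentioned above.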